The goal of maximum likelihood is to fit a distribution to some data.
Intuitively, we want to find the most likely values for the parameters of our model given the data; in practice, maximum likelihood picks the parameter values under which the observed data is most probable.
If we assume that the data was generated by a Gaussian distribution, the likelihood is given by the Gaussian probability density function.
We essentially brute-force fit Gaussian distributions to the data and keep the one that maximizes the likelihood function.
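This brute-force idea can be sketched as a simple grid search: evaluate the Gaussian density at the data for a range of candidate means and keep the best one. The data point and the fixed standard deviation below are hypothetical values, just for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Probability density of N(mu, sigma^2) evaluated at x."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Hypothetical single data point; sigma fixed at 1 for simplicity.
x = 32.0

# Try a grid of candidate means around the data point...
candidate_mus = [x + d / 10 for d in range(-50, 51)]

# ...and keep the mean that makes the observed point most probable.
best_mu = max(candidate_mus, key=lambda mu: gaussian_pdf(x, mu, sigma=1.0))
```

For a single data point the winning mean is the data point itself, which matches the intuition that the peak of the Gaussian should sit on the observation.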
So we begin fitting Gaussians. We start with
which yields a likelihood of:
But we can do better, right?
In fact, if we plug in
We get a likelihood of 0.12, which is definitely better!
By the way, if we plot the likelihood over all the possible values of
If we have multiple data points, the likelihood function is the product of the individual likelihood functions, one Gaussian density per data point.
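A minimal sketch of that product, using a few hypothetical measurements:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Probability density of N(mu, sigma^2) evaluated at x."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def likelihood(data, mu, sigma):
    """Joint likelihood: the product of each point's Gaussian density."""
    L = 1.0
    for x in data:
        L *= gaussian_pdf(x, mu, sigma)
    return L

# Hypothetical data points for illustration.
data = [1.8, 2.1, 2.4]

# A mean near the data gives a higher joint likelihood than one far away.
near = likelihood(data, mu=2.1, sigma=1.0)
far = likelihood(data, mu=5.0, sigma=1.0)
```

In practice people work with the log-likelihood (a sum instead of a product), since the product of many small densities quickly underflows; the maximizing parameters are the same either way.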
So we take the derivative of this expression with respect to
To get the maximum likelihood parameters for multiple data points, we multiply all the individual likelihood functions, take the derivative of the product, set it to zero, and solve for
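For the Gaussian, setting the derivative of the log-likelihood to zero has a well-known closed-form answer: the maximum likelihood mean is the sample mean, and the maximum likelihood variance is the average squared deviation (dividing by n, not n - 1). A sketch with hypothetical data:

```python
import math

# Hypothetical data points for illustration.
data = [4.2, 5.1, 4.8, 5.5, 4.9]
n = len(data)

# Closed-form maximum likelihood estimates from the zero-derivative condition.
mu_hat = sum(data) / n                                   # sample mean
var_hat = sum((x - mu_hat) ** 2 for x in data) / n       # 1/n variance
sigma_hat = math.sqrt(var_hat)

def log_likelihood(data, mu, sigma):
    """Sum of log Gaussian densities over all data points."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2)
        - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )

# Nudging mu away from mu_hat can only lower the log-likelihood.
at_peak = log_likelihood(data, mu_hat, sigma_hat)
nudged = log_likelihood(data, mu_hat + 0.1, sigma_hat)
```

This confirms numerically what the derivative argument gives analytically: the sample mean sits at the peak of the likelihood surface.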